Tightly Integrating Relational Learning and Multiple-Instance Regression for Real-Valued Drug Activity Prediction
Authors
Abstract
We present a new machine learning approach for 3D-QSAR, the task of predicting binding affinities of molecules to target proteins based on 3D structure. Our approach predicts binding affinity by using regression on substructures discovered by relational learning. We make two contributions to the state of the art. First, we use multiple-instance (MI) regression, which represents a molecule as a set of 3D conformations, to model activity. Second, the relational learning component employs the “Score As You Use” (SAYU) method to select substructures for their ability to improve the regression model. This is the first application of SAYU to multiple-instance, real-valued prediction. We evaluate our approach on three tasks and demonstrate that (i) SAYU outperforms standard coverage measures when selecting features for regression, (ii) the MI representation improves accuracy over standard single feature-vector encodings, and (iii) combining SAYU with MI regression is more accurate for 3D-QSAR than either approach by itself.
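To make the multiple-instance representation concrete, here is a minimal Python sketch of a molecule treated as a bag of conformations. It is not the method from the paper. The 0/1 substructure features, the linear model, and the rule that a molecule's activity equals that of its highest-scoring conformation are all assumptions chosen for illustration, and the names `predict_instance` and `predict_bag` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical encoding: each conformation is a 0/1 vector saying which
# substructures it contains; a molecule (bag) is a list of conformations.
Conformation = Sequence[float]
Bag = List[Conformation]


def predict_instance(weights: Sequence[float], bias: float, x: Conformation) -> float:
    """Score a single conformation with a linear model."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def predict_bag(weights: Sequence[float], bias: float, bag: Bag) -> float:
    """Score a molecule.

    Assumption for this sketch: the molecule's activity is taken to be
    the score of its best conformation (max over the bag).
    """
    return max(predict_instance(weights, bias, x) for x in bag)


# Toy data, invented for illustration only.
weights = [1.5, -0.5, 2.0]
bias = 0.25
molecule = [
    [1, 0, 0],  # conformation A
    [1, 1, 1],  # conformation B
    [0, 1, 0],  # conformation C
]
activity = predict_bag(weights, bias, molecule)
```

A single feature-vector encoding would collapse the three conformations into one vector before scoring. The bag form keeps them separate, so the model can attribute activity to one particular conformation.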
Similar resources
Multiple-Instance Learning of Real-Valued Data
The multiple-instance learning model has received much attention recently with a primary application area being that of drug activity prediction. Most prior work on multiple-instance learning has been for concept learning, yet for drug activity prediction, the label is a real-valued affinity measurement giving the binding strength. We present extensions of k-nearest neighbors (k-NN), Citation-k...
Multiple Fuzzy Regression Model for Fuzzy Input-Output Data
A novel approach to the problem of regression modeling for fuzzy input-output data is introduced. In order to estimate the parameters of the model, a distance on the space of interval-valued quantities is employed. By minimizing the sum of squared errors, a class of regression models is derived based on the interval-valued data obtained from the $\alpha$-level sets of fuzzy input-output data. Then,...
Real-Valued Multiple-Instance Learning with Queries
While there has been a significant amount of theoretical and empirical research on the multiple-instance learning model, most of this research is for concept learning. However, for the important application area of drug discovery, a real-valued classification is preferable. In this paper we initiate a theoretical study of real-valued multiple-instance learning. We prove that the problem of find...
Salience Assignment for Multiple-Instance Regression
We present a Multiple-Instance Learning (MIL) algorithm for determining the salience of each item in each bag with respect to the bag’s real-valued label. We use an alternating-projections constrained optimization approach to simultaneously learn a regression model and estimate all salience values. We evaluate this algorithm on a significant real-world problem, crop yield modeling, and demonstr...
Relational Instance Based Regression for Relational Reinforcement Learning
The full paper on this topic appears in the Proceedings of the Twentieth International Conference on Machine Learning [1]. Q-learning [6] is a model-free approach to tackling reinforcement learning problems which calculates a Quality or Q-function to represent the learned policy. The Q-function takes a state-action pair as input and outputs a real number which indicates the quality of that action ...